hw1 (Score: 9.0 / 11.0)

  1. Test cell (Score: 1.0 / 1.0)
  2. Test cell (Score: 1.0 / 1.0)
  3. Test cell (Score: 1.0 / 1.0)
  4. Written response (Score: 0.5 / 1.0)
  5. Comment
  6. Written response (Score: 0.0 / 1.0)
  7. Written response (Score: 1.0 / 1.0)
  8. Written response (Score: 0.5 / 1.0)
  9. Comment
  10. Test cell (Score: 1.0 / 1.0)
  11. Test cell (Score: 1.0 / 1.0)
  12. Written response (Score: 1.0 / 1.0)
  13. Test cell (Score: 1.0 / 1.0)

Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Vincent Chen"
COLLABORATORS = ""

HW1: Setup, Prerequisites, and Image Classification

Course Policies

Here are some important course policies. These are also located at http://www.ds100.org/sp18/.

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with others please include their names at the top of your solution.

This assignment

This part goes over prerequisites to taking DS100.

  • How to set up Jupyter on your own computer.
  • How to check out and submit assignments for this class.
  • Python basics, like defining functions.
  • How to use the numpy library to compute with arrays of numbers.
  • Partial derivatives and matrix expressions

Due Date

This assignment is due at 11:59pm Tuesday, January 30th. Instructions for submission are at the bottom of this assignment.

Part 1: Prerequisites

Setup

If you haven't already, go through the instructions at http://www.ds100.org/sp18/setup.

The instructions for submission are at the end of this notebook.

You should now be able to open this notebook in Jupyter and run cells.

Running a Cell

Try running the following cell. If you are unfamiliar with Jupyter Notebooks, consider skimming this tutorial or selecting Help -> User Interface Tour in the menu above.

In [2]:
print("Hello World!")
Hello World!

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future). To learn about keyboard shortcuts go to Help -> Keyboard Shortcuts in the menu above.

Here are a few we like:

  1. ctrl+return : Evaluate the current cell
  2. shift+return: Evaluate the current cell and move to the next
  3. esc : command mode (required before using any of the commands below)
  4. a : create a cell above
  5. b : create a cell below
  6. d : delete a cell
  7. m : convert a cell to markdown
  8. y : convert a cell to code

Testing your Setup

If you've set up your environment properly, this cell should run without problems:

In [3]:
import math
import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import pandas as pd
import skimage
import skimage.io
import skimage.filters

Python

Python is the main programming language we'll use in the course. We expect that you've taken CS61A or an equivalent class, so you should be able to explain the following cells. Run them and make sure you understand what is happening in each.

If this seems difficult, please review one or more of the following materials.

Mathematical Expressions

Note that the rocket icon indicates that you should just run the following cells.

In [4]:
# This is a comment.
# In Python, the ** operator performs exponentiation.
math.sqrt(math.e ** (-math.pi + 1))
Out[4]:
0.3427354792736325

Output and Printing

In [5]:
"Why didn't this line print?"

print("Hello" + ",", "world!")

"Hello, cell" + " output!"
Hello, world!
Out[5]:
'Hello, cell output!'

For Loops

In [6]:
# A for loop repeats a block of code once for each
# element in a given collection.
for i in range(5):
    if i % 2 == 0:
        print(2**i)
    else:
        print("Odd power of 2")
1
Odd power of 2
4
Odd power of 2
16

List Comprehension

In [7]:
[str(i) + " sheep." for i in range(1,5)] 
Out[7]:
['1 sheep.', '2 sheep.', '3 sheep.', '4 sheep.']
In [8]:
[i for i in range(10) if i % 2 == 0]
Out[8]:
[0, 2, 4, 6, 8]

Defining Functions

In [9]:
def add2(x):
    """This docstring explains what this function does: it adds 2 to a number."""
    return x + 2

Getting Help

In [10]:
help(add2)
Help on function add2 in module __main__:

add2(x)
    This docstring explains what this function does: it adds 2 to a number.

You can close the window at the bottom by pressing esc several times.

Passing Functions as Values

In [11]:
def makeAdder(amount):
    """Make a function that adds the given amount to a number."""
    def addAmount(x):
        return x + amount
    return addAmount

add3 = makeAdder(3)
add3(4)
Out[11]:
7
In [12]:
makeAdder(3)(4)
Out[12]:
7

Anonymous Functions and Lambdas

In [13]:
# add4 is very similar to add2, but it's been created using a lambda expression.
add4 = lambda x: x + 4
add4(5)
Out[13]:
9

Recursion

In [14]:
def fib(n):
    if n <= 1:
        return 1
    else:
        # Functions can call themselves recursively.
        return fib(n-1) + fib(n-2)

fib(6)
Out[14]:
13

Question 1

Question 1a

Write a function nums_reversed that takes in an integer n and returns a string containing the numbers 1 through n including n in reverse order, separated by spaces. For example:

>>> nums_reversed(5)
'5 4 3 2 1'

Note: The ellipsis (...) indicates something you should fill in. It doesn't necessarily imply you should replace it with only one line of code.


The code icon indicates that you should complete the following block of code.

In [15]:
Student's answer(Top)
def nums_reversed(n):
    s = ""
    for i in range(n, 1, -1):
        s += str(i) + " "
    s += "1"
    return s
    
nums_reversed(5)
Out[15]:
'5 4 3 2 1'

Test your code in the cell below.

In [16]:
Grade cell: num-reversed-tests Score: 1.0 / 1.0 (Top)
assert nums_reversed(5) == '5 4 3 2 1'
assert nums_reversed(1) == '1'

### BEGIN HIDDEN TESTS
assert nums_reversed(3) ==  '3 2 1'
### END HIDDEN TESTS
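
For comparison, here is a minimal alternative sketch, not the graded answer: the same string can be built with `str.join` over a descending `range`, which avoids handling the final "1" separately. The name `nums_reversed_join` is hypothetical and chosen only to avoid shadowing the function above.

```python
def nums_reversed_join(n):
    """Return the numbers n, n-1, ..., 1 as a space-separated string."""
    # range(n, 0, -1) counts down from n and stops before reaching 0.
    return " ".join(str(i) for i in range(n, 0, -1))


print(nums_reversed_join(5))
```

Because `join` only places separators between elements, no trailing space needs to be trimmed.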

Question 1b

Write a function string_splosion that takes in a non-empty string like "Code" and returns a long string containing every prefix of the input. For example:

>>> string_splosion('Code')
'CCoCodCode'
>>> string_splosion('data!')
'ddadatdatadata!'
>>> string_splosion('hi')
'hhi'

Hint: Try to use recursion. Think about how you might answer the following two questions:

  1. [Base Case] What is the string_splosion of the empty string?
  2. [Inductive Step] If you had a string_splosion function for the first $n-1$ characters of your string how could you extend it to the $n^{th}$ character? For example, string_splosion("Cod") = "CCoCod" becomes string_splosion("Code") = "CCoCodCode".
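
The base case and inductive step above can be sketched recursively as follows. This is one possible reading of the hint, not the only valid solution; the name `string_splosion_recursive` is hypothetical.

```python
def string_splosion_recursive(string):
    """Concatenate every prefix of `string`, shortest first."""
    # Base case: the empty string has no prefixes to contribute.
    if string == "":
        return ""
    # Inductive step: solve the problem for the first n-1 characters,
    # then append the full n-character string.
    return string_splosion_recursive(string[:-1]) + string


print(string_splosion_recursive("Code"))
```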

In [17]:
Student's answer(Top)
def string_splosion(string):
    s = ""
    for i in range(1,len(string) + 1):
        s += string[:i]
    return s
        
string_splosion('fade')
Out[17]:
'ffafadfade'

Test your code in the cell below.

In [18]:
Grade cell: string-splosion-test Score: 1.0 / 1.0 (Top)
assert string_splosion('Code') == 'CCoCodCode'
assert string_splosion('fade') == 'ffafadfade'

### BEGIN HIDDEN TESTS
assert string_splosion('Kitten') == 'KKiKitKittKitteKitten'
assert string_splosion('data!') == 'ddadatdatadata!'
### END HIDDEN TESTS

Question 1c

Write a function double100 that takes in a list of integers and returns True only if the list has two 100s next to each other.

>>> double100([100, 2, 3, 100])
False
>>> double100([2, 3, 100, 100, 5])
True

In [19]:
Student's answer(Top)
def double100(nums):
    i = 0
    while i < len(nums) - 1:
        if nums[i] == 100 and nums[i + 1] == 100:
            print("True")
            return True
        i += 1
    print("False")
    return False
In [20]:
Grade cell: double100-tests Score: 1.0 / 1.0 (Top)
assert double100([3, 3, 100, 100]) == True
assert double100([5, 2, 5, 2]) == False
assert double100([4, 2, 4, 100, 100, 5]) == True
True
False
True
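
As a hedged alternative sketch, adjacent pairs can be compared with `zip`, which sidesteps index arithmetic entirely. The name `double100_pairs` is hypothetical and is not part of the assignment.

```python
def double100_pairs(nums):
    """Return True if two adjacent elements of `nums` are both 100."""
    # zip(nums, nums[1:]) yields each adjacent pair exactly once and stops
    # at the shorter sequence, so it never reads past the end of the list.
    return any(a == 100 and b == 100 for a, b in zip(nums, nums[1:]))


print(double100_pairs([2, 3, 100, 100, 5]))
```

This version also handles lists of length 0 or 1, for which there are no adjacent pairs.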

NumPy and Tables

The NumPy library lets us do fast, simple computing with numbers in Python.

You should be able to understand the code in the following cells. If not, review the following:

Jupyter pro-tip: Pull up the docs for any function in Jupyter by running a cell with the function name and a ? at the end:

In [21]:
np.arange?

Another Jupyter pro-tip: Pull up the docs for any function in Jupyter by typing the function name, then <Shift>-<Tab> on your keyboard. Super convenient when you forget the order of the arguments to a function. You can press <Tab> multiple times to expand the docs.

Try it on the function below:

In [22]:
np.linspace
np.cumprod?

You can use the tips above to help you decipher the following code.

In [23]:
# Let's take a 20-sided die...
NUM_FACES = 20

# ...and roll it 4 times
rolls = 4

# What's the probability that all 4 rolls are different? It's:
# 20/20 * 19/20 * 18/20 * 17/20
prob_diff = np.prod((NUM_FACES - np.arange(rolls))
                    / NUM_FACES)
prob_diff
Out[23]:
0.72675000000000001
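
As a sanity check on the product above, the same probability can be computed exactly with Python's standard `fractions` module. This is an illustrative sketch; `prob_exact` is a name introduced here and does not appear in the assignment.

```python
from fractions import Fraction

NUM_FACES = 20
rolls = 4

# Multiply (20/20) * (19/20) * (18/20) * (17/20) using exact rationals.
prob_exact = Fraction(1)
for i in range(rolls):
    prob_exact *= Fraction(NUM_FACES - i, NUM_FACES)

print(prob_exact, float(prob_exact))
```

The exact value converts to the same decimal as the NumPy result, up to floating-point rounding.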
In [24]:
# Let's compute that probability for 1 roll, 2 rolls, ..., 20 rolls.
# The array ys will contain:
# 
# 20/20
# 20/20 * 19/20
# 20/20 * 19/20 * 18/20
# ...
# 20/20 * 19/20 * ... * 1/20

xs = np.arange(20)
ys = np.cumprod((NUM_FACES - xs) / NUM_FACES)

# Python slicing works on arrays too
ys[:5]
Out[24]:
array([ 1.     ,  0.95   ,  0.855  ,  0.72675,  0.5814 ])
In [25]:
plt.plot(xs, ys, 'o-')
plt.xlabel("Num Rolls")
plt.ylabel('P(all different)')
Out[25]:
Text(0,0.5,'P(all different)')
In [26]:
# Mysterious...
mystery = np.exp(-xs ** 2 / (2 * NUM_FACES))
mystery
Out[26]:
array([  1.00000000e+00,   9.75309912e-01,   9.04837418e-01,
         7.98516219e-01,   6.70320046e-01,   5.35261429e-01,
         4.06569660e-01,   2.93757700e-01,   2.01896518e-01,
         1.31993843e-01,   8.20849986e-02,   4.85578213e-02,
         2.73237224e-02,   1.46253347e-02,   7.44658307e-03,
         3.60656314e-03,   1.66155727e-03,   7.28152539e-04,
         3.03539138e-04,   1.20362805e-04])
In [27]:
# If you're curious, this is the exponential approximation for our probability:
# https://textbook.prob140.org/ch1/Exponential_Approximation.html
plt.plot(xs, ys, 'o-', label="All Different")
plt.plot(xs, mystery, 'o-', label="Mystery")
plt.xlabel("Num Rolls")
plt.ylabel('P(all different)')
plt.legend()
Out[27]:
<matplotlib.legend.Legend at 0x7fd1650fc668>
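
A rough sketch of why the two curves are close, writing $N$ for NUM_FACES and $k$ for an entry of xs, and using the approximation $\ln(1 - t) \approx -t$ for small $t$. In the code, ys[k] is the probability that $k + 1$ rolls are all different:

$$ \prod_{i=0}^{k} \left( 1 - \frac{i}{N} \right) = \exp\left( \sum_{i=0}^{k} \ln\left( 1 - \frac{i}{N} \right) \right) \approx \exp\left( -\sum_{i=0}^{k} \frac{i}{N} \right) = \exp\left( -\frac{k(k+1)}{2N} \right) \approx \exp\left( -\frac{k^2}{2N} \right) $$

The last step drops the lower-order term in the exponent, so the match is approximate rather than exact, and it worsens as $i/N$ approaches 1.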

Question 2

To test your understanding of NumPy we will work through some basic image exercises. In the process we will explore visual perception and color.

Images are 2-dimensional grids of pixels. Each pixel contains 3 values between 0 and 1 that specify how much red, green, and blue go into each pixel.

We can create images in NumPy:

In [28]:
simple_image = np.array([
    [[  0,   0, 0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]], # Grayscale pixels
    [[1.0,   0, 0], [  0, 1.0,   0], [  0,   0, 1.0]], # Pure RGB pixels
    [[0.5, 0.5, 0], [0.5,   0, 0.5], [  0, 0.5, 0.5]], # Blend of 2 colors
])
simple_image
Out[28]:
array([[[ 0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ]],

       [[ 1. ,  0. ,  0. ],
        [ 0. ,  1. ,  0. ],
        [ 0. ,  0. ,  1. ]],

       [[ 0.5,  0.5,  0. ],
        [ 0.5,  0. ,  0.5],
        [ 0. ,  0.5,  0.5]]])

We can then use the scikit-image library to display an image:

In [29]:
# Curious how this method works? Try using skimage.io.imshow? to find out.
# Or, you can always look at the docs for the method.
skimage.io.imshow(simple_image)
plt.grid(False) # Disable matplotlib's grid lines

We can read in image files using the skimage.io.imread method.

Note that in many image formats (e.g., JPEG) image values are numbers between 0 and 255 corresponding to a byte. Therefore we divide each pixel value by 255 to obtain numbers between 0 and 1.
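
The rescaling can be illustrated on a tiny hand-made array, without loading a file. This is a sketch under the assumption that the image arrives as 8-bit unsigned integers; the array `raw` is made up for illustration.

```python
import numpy as np

# A made-up 1x2 "image" with 8-bit RGB values, standing in for a JPEG.
raw = np.array([[[0, 128, 255], [255, 0, 51]]], dtype=np.uint8)

# Dividing by 255 promotes the array to float64 and maps 0..255 onto 0..1.
scaled = raw / 255

print(scaled.dtype, scaled.min(), scaled.max())
```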

In [30]:
plt.figure(figsize=(20,10))

# Some image files (including .jpg files) have pixel values in between
# 0 and 255 when read. We divide by 255 to scale the values between 0 and 1:
pic = skimage.io.imread('target.jpg')/255


skimage.io.imshow(pic)
plt.grid(False) # Disable matplotlib's grid lines

Professor Gonzalez is a very amateur archer.

Question 2a

Complete the following block of code to plot the Red, Green, and Blue color channels separately. The resulting images should appear in black and white.

  • Hint: pic[:, :, 0] will slice the image to extract the red color channel. Plotting the resulting matrix will generate a black and white picture.
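
To see what the slice in the hint produces, here is a small sketch on a hypothetical 2x2 image (the array `tiny` is invented for illustration and is unrelated to target.jpg):

```python
import numpy as np

# Hypothetical 2x2 RGB image; the last axis holds (red, green, blue).
tiny = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.5, 0.5, 0.5]],
])

# Fixing the last index at 0 keeps only the red intensities,
# turning the 2x2x3 image into a 2x2 matrix.
red = tiny[:, :, 0]

print(tiny.shape, red.shape)
print(red)
```

Indices 1 and 2 on the last axis select the green and blue channels in the same way.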

In [31]:
Student's answer(Top)
plt.figure(figsize=(20,10)) 
channel_names = ["Red", "Green", "Blue"]

# Loop through index of each channel
for channel in range(3):
    # Make a subplot
    plt.subplot(1,3,channel+1)
    
    skimage.io.imshow(pic[:, :, channel])
    
    plt.grid(False)
    plt.title(channel_names[channel])

Question 2b

Surprisingly the human eye doesn't see all colors equally. To demonstrate this we will study how blurring color channels affects image appearance. First, we will try to blur each color channel individually. Complete the following block of code using the skimage.filters.gaussian blurring function (read the docs) to render a blurry black and white image for each of the color channels. You should set the standard deviation of the Gaussian blurring kernel sigma to 10.

In [32]:
Student's answer(Top)
plt.figure(figsize=(20,10))

sigma = 10

# Loop through index of each channel
for channel in range(3):
    # Make a subplot
    plt.subplot(1,3,channel+1)
    # FINISH THE CODE 
    
    # Blur only this channel and display the blurred result
    blurred = skimage.filters.gaussian(pic[:, :, channel], sigma=sigma)
    skimage.io.imshow(blurred)
    
    
    plt.grid(False)
    plt.title(channel_names[channel])

Question 2c

Using the following block of code:

pic_copy = pic.copy()
pic_copy[:, :, channel] = ...
skimage.io.imshow(pic_copy)

we can replace a color channel with a different black and white image. Complete the following block of code to render three different versions of the full color image with just one of the channels blurred.

In [33]:
Student's answer(Top)
plt.figure(figsize=(20,10))

sigma = 10

# Loop through index of each channel
for channel in range(3):
    # Make a subplot
    plt.subplot(1,3,channel+1)
    
    pic_copy = pic.copy()
    pic_copy[:, :, channel] = skimage.filters.gaussian(pic, sigma=10, multichannel=True)[:,:,channel]
    skimage.io.imshow(pic_copy)
    
    plt.grid(False)
    plt.title(channel_names[channel])

Question 2d

Each image should appear slightly different. Which one is the blurriest and which is the sharpest? Write a short description of what you see in the cell below.

*This icon means you will need to write a text response in English in the cell below.

*Hint: I observe ... . One possible explanation for this is ... .

Student's answer Score: 0.5 / 1.0 (Top)

We don't necessarily see every color the same. In fact, this can be seen from the three images seen above in which some actually don't change in color too much at all. We don't quite see blue, and then red, and then we see green the most.

Multivariable Calculus and Linear Algebra

The following questions ask you to recall your knowledge of multivariable calculus and linear algebra. We will use some of the most fundamental concepts from each discipline in this class, so the following problems should at least seem familiar to you.

For the following problems, you should use LaTeX to format your answer. If you aren't familiar with LaTeX, not to worry. It's not hard to use in a Jupyter notebook. Just place your math in between dollar signs:

\$ f(x) = 2x \$ becomes $ f(x) = 2x $.

If you have a longer equation, use double dollar signs:

\$\$ \sum_{i=0}^n i^2 \$\$ becomes:

$$ \sum_{i=0}^n i^2 $$.

Here is some handy notation:

Output Latex
$$x^{a + b}$$ x^{a + b}
$$x_{a + b}$$ x_{a + b}
$$\frac{a}{b}$$ \frac{a}{b}
$$\sqrt{a + b}$$ \sqrt{a + b}
$$\{ \alpha, \beta, \gamma, \pi, \mu, \sigma^2 \}$$ \{ \alpha, \beta, \gamma, \pi, \mu, \sigma^2 \}
$$\sum_{x=1}^{100}$$ \sum_{x=1}^{100}
$$\frac{\partial}{\partial x} $$ \frac{\partial}{\partial x}
$$\begin{bmatrix} 2x + 4y \\ 4x + 6y^2 \\ \end{bmatrix}$$ \begin{bmatrix} 2x + 4y \\ 4x + 6y^2 \\ \end{bmatrix}

For more about basic LaTeX formatting, you can read this article.

If you have trouble with these topics, we suggest reviewing:

Question 3

Question 3a

Simplify the following expression:

$$ \large \ln \left( 3 e^{2 x} e^{\frac{1}{x^2}} \right) $$

*This icon means you will need to write a text response in the cell below using English + $\LaTeX$.

$$ \ln \left( 3 e^{2 x} e^{\frac{1}{x^2}} \right) = \ldots $$

Student's answer Score: 0.0 / 1.0 (Top)

$ 6x (1/x^2) $
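
For reference, one way to work through the simplification, using the identities $\ln(ab) = \ln a + \ln b$ and $\ln e^{u} = u$:

$$ \ln \left( 3 e^{2 x} e^{\frac{1}{x^2}} \right) = \ln 3 + \ln e^{2x} + \ln e^{\frac{1}{x^2}} = \ln 3 + 2x + \frac{1}{x^2} $$

The logarithm turns the product into a sum, so the exponents are added rather than multiplied.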

Question 3b

Suppose we have the following scalar-valued function on $x$ and $y$:

$$ \Large f(x, y) = x^2 + 4xy + 2y^3 + e^{-3y} + \ln(2y) $$

Compute the partial derivative with respect to $x$.

$$ \frac{\partial}{\partial x} f(x,y) = ... $$

Student's answer Score: 1.0 / 1.0 (Top)

$2x + 4y$

Question 3c

Now compute the partial derivative of $f(x,y)$ with respect to $y$:

$$ \frac{\partial}{\partial y} f(x,y) = ... $$

Student's answer Score: 0.5 / 1.0 (Top)

$ 4x + 6y + -3e^(-3y)+ (1/y) $
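
For reference, differentiating $f(x, y) = x^2 + 4xy + 2y^3 + e^{-3y} + \ln(2y)$ term by term with respect to $y$, treating $x$ as a constant:

$$ \frac{\partial}{\partial y} f(x,y) = 0 + 4x + 6y^2 - 3e^{-3y} + \frac{1}{y} = 4x + 6y^2 - 3e^{-3y} + \frac{1}{y} $$

Here $\frac{\partial}{\partial y} 2y^3 = 6y^2$ by the power rule, and $\frac{\partial}{\partial y} \ln(2y) = \frac{2}{2y} = \frac{1}{y}$ by the chain rule.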

Question 4

In this question, we'll ask you to use your linear algebra knowledge to fill in NumPy matrices. To conduct matrix multiplication in NumPy, you should write code like the following:

In [34]:
# A matrix in NumPy is simply a 2-dimensional NumPy array
matA = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

matB = np.array([
    [10, 11],
    [12, 13],
    [14, 15],
])

# The notation A @ B means: compute the matrix product AB
matA @ matB
Out[34]:
array([[ 76,  82],
       [184, 199]])

You can also use the same syntax to do matrix-vector multiplication or vector dot products. Handy!

In [35]:
matA = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

# A vector in NumPy is simply a 1-dimensional NumPy array
some_vec = np.array([ 10, 12, 14, ])

another_vec = np.array([ 10, 20, 30 ])

print(matA @ some_vec)
print(some_vec @ another_vec)
[ 76 184]
760
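
To make explicit what `@` computes for a matrix and a vector, here is a plain-Python sketch that reproduces the first printed result above. The helper `mat_vec` is hypothetical and is shown only for illustration; NumPy's own implementation is different and much faster.

```python
def mat_vec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector using plain Python."""
    # Each output entry is the dot product of one matrix row with the vector.
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


mat_a = [[1, 2, 3], [4, 5, 6]]
vec = [10, 12, 14]

print(mat_vec(mat_a, vec))
```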

Question 4a

Joey, Deb, and Sam are shopping for fruit at Berkeley Bowl. Berkeley Bowl, true to its name, only sells fruit bowls. A fruit bowl contains some fruit and the price of a fruit bowl is the total price of all of its individual fruit.

Berkeley Bowl has apples for \$2.00, bananas \$1.00, and cantaloupes \$4.00 (expensive!). The price of each of these can be written in a vector:

$$ \vec{v} = \begin{bmatrix} 2 \\ 1 \\ 4 \\ \end{bmatrix} $$

Berkeley Bowl sells the following fruit bowls:

  1. 2 of each fruit
  2. 5 apples and 8 bananas
  3. 2 bananas and 3 cantaloupes
  4. 10 cantaloupes

Create a 2-dimensional numpy array encoding the matrix $B$ such that the matrix-vector multiplication

$$ B\vec{v} $$

evaluates to a length 4 column vector containing the price of each fruit bowl. The first entry of the result should be the cost of fruit bowl #1, the second entry the cost of fruit bowl #2, etc.

In [36]:
Student's answer(Top)
v = np.array([2,1,4])

# Fill in B
# B = ...

B = np.array([
    [2, 2, 2],
    [5, 8, 0],
    [0, 2, 3],
    [0, 0, 10],
])

# The notation B @ v means: compute the matrix multiplication Bv
B @ v
Out[36]:
array([14, 18, 14, 40])
In [37]:
Grade cell: question4-tests Score: 1.0 / 1.0 (Top)
assert np.allclose(B @ v, np.array([14, 18, 14, 40]))

Question 4b

Joey, Deb, and Sam make the following purchases:

  • Joey buys 2 fruit bowl #1's and 1 fruit bowl #2.
  • Deb buys 1 of each fruit bowl.
  • Sam buys 10 fruit bowl #4s (he really likes cantaloupes).

Create a matrix $A$ such that the matrix expression

$$ AB\vec{v} $$

evaluates to a length 3 column vector containing how much each of them spent. The first entry of the result should be the total amount spent by Joey, the second entry the amount spent by Deb, etc.

In [38]:
Student's answer(Top)
A = np.array([
    [2, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 10],
]) 

A @ B @ v 
Out[38]:
array([ 46,  86, 400])
In [39]:
Grade cell: question4b-test Score: 1.0 / 1.0 (Top)
# The final tests for this question have been hidden.

### BEGIN HIDDEN TESTS
assert np.allclose(A @ B @ v , np.array([ 46,  86, 400]))
### END HIDDEN TESTS

Question 4c

Who spent the most money?

Student's answer Score: 1.0 / 1.0 (Top)

Sam

Question 4d

Let's suppose Berkeley Bowl changes their fruit prices, but you don't know what they changed their prices to. Joey, Deb, and Sam buy the same quantity of fruit bowls and the number of fruit in each bowl is the same, but now they each spent these amounts:

$$ \vec{x} = \begin{bmatrix} 80 \\ 80 \\ 100 \\ \end{bmatrix} $$

Use np.linalg.inv and the above final costs to compute the new prices for the individual fruits:

In [40]:
Student's answer(Top)
x = np.array([80,80,100])
new_v = np.linalg.inv(A@B) @ x
In [41]:
Grade cell: question4d-test Score: 1.0 / 1.0 (Top)
assert np.allclose(new_v, np.array([ 5.5,  2.20833333,  1.]))
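
The question asks for `np.linalg.inv`, but as a hedged aside, the same prices can be obtained with `np.linalg.solve`, which solves the linear system directly instead of forming the inverse. The sketch below repeats the matrices from the answers above so that it runs on its own; `new_prices` is a name introduced here.

```python
import numpy as np

A = np.array([[2, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 10]])
B = np.array([[2, 2, 2], [5, 8, 0], [0, 2, 3], [0, 0, 10]])
x = np.array([80, 80, 100])

# Solve (A @ B) p = x for the price vector p without computing an inverse.
new_prices = np.linalg.solve(A @ B, x)

print(new_prices)
```

Solving the system directly is generally preferred in numerical code, since it tends to be both faster and more accurate than multiplying by an explicit inverse.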

Submission

You're done!

In order to turn in this assignment, submit this notebook to the Data 100 datahub at http://data100.datahub.berkeley.edu.

You will need to upload this notebook and any associated files to datahub manually if you have completed this assignment on your local machine. Detailed instructions for how to submit on Datahub can be found at http://www.ds100.org/sp18/materials.

Remember to click 'Validate' for this assignment before submitting. After clicking 'Submit', you can verify there is a time-stamped submission under 'Submitted assignments'.